Multi-Modal 3D Object Detection in Autonomous Driving: a Survey
In the past few years, we have witnessed rapid development of autonomous
driving. However, achieving full autonomy remains a daunting task due to the
complex and dynamic driving environment. As a result, self-driving cars are
equipped with a suite of sensors to conduct robust and accurate environment
perception. As the number and type of sensors keep increasing, combining them
for better perception is becoming a natural trend. So far, there has been no
in-depth review that focuses on multi-sensor fusion-based perception. To bridge
this gap and motivate future research, this survey is devoted to reviewing recent
fusion-based deep learning models for 3D detection that leverage multiple sensor
data sources, especially cameras and LiDARs. In this survey, we first introduce
the background of popular sensors for autonomous cars, including their common
data representations as well as object detection networks developed for each
type of sensor data. Next, we discuss some popular datasets for multi-modal 3D
object detection, with a special focus on the sensor data included in each
dataset. Then we present in-depth reviews of recent multi-modal 3D detection
networks by considering the following three aspects of the fusion: fusion
location, fusion data representation, and fusion granularity. After a detailed
review, we discuss open challenges and point out possible solutions. We hope
that our detailed review can help researchers embark on investigations in the
area of multi-modal 3D object detection.
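For readers new to the taxonomy above, the sketch below illustrates one concrete meaning of "fusion location": feature-level (late) camera-LiDAR fusion on a shared bird's-eye-view grid, with point-level (early) fusion noted in a closing comment. This is a generic, minimal PyTorch sketch and is not taken from any surveyed model; all module names, channel sizes, and tensor shapes are assumptions made for illustration.

```python
# Minimal sketch (assumed, not from any surveyed model) contrasting two common
# "fusion locations" for camera-LiDAR 3D detection.
import torch
import torch.nn as nn

class LateBEVFusion(nn.Module):
    """Feature-level (late) fusion: each modality is first encoded separately
    into a bird's-eye-view (BEV) grid; the grids are then concatenated and
    reduced before a shared 3D detection head."""

    def __init__(self, lidar_c: int = 64, camera_c: int = 64, fused_c: int = 128):
        super().__init__()
        self.reduce = nn.Conv2d(lidar_c + camera_c, fused_c, kernel_size=3, padding=1)

    def forward(self, lidar_bev: torch.Tensor, camera_bev: torch.Tensor) -> torch.Tensor:
        # lidar_bev:  (B, C_lidar, H, W) BEV features from a point-cloud backbone
        # camera_bev: (B, C_cam, H, W)  image features lifted onto the same BEV grid
        return self.reduce(torch.cat([lidar_bev, camera_bev], dim=1))

# Point-level (early) fusion would instead decorate raw LiDAR points with image
# features *before* the point-cloud backbone runs, e.g.:
#   decorated = torch.cat([points_xyz, image_features_at_projected_pixels], dim=-1)
```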
OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving
Visual Odometry (VO) plays a pivotal role in autonomous systems, with a
principal challenge being the lack of depth information in camera images. This
paper introduces OCC-VO, a novel framework that capitalizes on recent advances
in deep learning to transform 2D camera images into 3D semantic occupancy,
thereby circumventing the traditional need for concurrent estimation of ego
poses and landmark locations. Within this framework, we utilize the TPV-Former
to convert surround-view camera images into 3D semantic occupancy. Addressing
the challenges presented by this transformation, we have specifically tailored
a pose estimation and mapping algorithm that incorporates a Semantic Label
Filter and a Dynamic Object Filter, and finally utilizes a Voxel PFilter to
maintain a consistent global semantic map. Evaluations on the Occ3D-nuScenes benchmark
not only showcase a 20.6% improvement in Success Ratio and a 29.6% enhancement
in trajectory accuracy against ORB-SLAM3, but also emphasize our ability to
construct a comprehensive map. Our implementation is open-sourced and available
at https://github.com/USTCLH/OCC-VO.
Comment: 7 pages, 3 figures.
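As a rough illustration of the mapping side of the pipeline described above (not the authors' implementation), the following sketch shows how per-frame semantic occupancy might be filtered and accumulated into a global voxel map. The occupancy prediction (TPV-Former), the pose estimate, and the exact behavior of the Semantic Label Filter, Dynamic Object Filter, and Voxel PFilter are all replaced with simplified stand-ins; the class IDs, voxel size, and hit-count threshold are invented for the example.

```python
# Hedged sketch of an OCC-VO-style mapping loop; all constants are assumptions.
import numpy as np

DYNAMIC_CLASSES = {1, 2}   # e.g. car, pedestrian label IDs (assumed)
STATIC_KEEP = {3, 4, 5}    # semantic labels trusted for mapping (assumed)

def frame_to_map(occupancy, labels, pose, global_map, min_hits=3):
    """occupancy: (N, 3) voxel centers in the ego frame
    labels:     (N,) semantic label per voxel
    pose:       4x4 ego-to-world transform estimated for this frame
    global_map: dict mapping a quantized world voxel -> [label, hit_count]"""
    # Semantic Label Filter / Dynamic Object Filter, both heavily simplified:
    keep = np.isin(labels, list(STATIC_KEEP)) & ~np.isin(labels, list(DYNAMIC_CLASSES))
    pts = occupancy[keep]
    # transform surviving voxels into the world frame
    pts_h = np.c_[pts, np.ones(len(pts))]
    world = (pose @ pts_h.T).T[:, :3]
    # Voxel "P-Filter" stand-in: only voxels observed repeatedly are trusted
    for p, lab in zip(np.round(world / 0.4).astype(int), labels[keep]):
        entry = global_map.setdefault(tuple(p), [lab, 0])
        entry[1] += 1
    return {k: v for k, v in global_map.items() if v[1] >= min_hits}
```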
Mga Modulates Bmpr1a Activity by Antagonizing Bs69 in Zebrafish
MAX giant associated protein (MGA) is a dual transcription factor containing both T-box and bHLHzip DNA-binding domains. In vitro studies have shown that MGA functions as a transcriptional repressor or activator to regulate transcription from promoters containing either E-box or T-box binding sites. BS69 (ZMYND11), a multidomain (i.e., PHD, BROMO, PWWP, and MYND) protein, has been shown to selectively recognize the histone variant H3.3 trimethylated at lysine 36 (H3.3K36me3), modulate RNA Polymerase II elongation, and function as an RNA splicing regulator. Mutations in MGA or BS69 have been linked to multiple cancers and neural developmental disorders. Here, by TALEN- and CRISPR/Cas9-mediated loss-of-gene-function assays, we show that zebrafish Mga and Bs69 are required to maintain proper Bmp signaling during early embryogenesis. We found that Mga protein localized in the cytoplasm modulates Bmpr1a activity through physical association with Zmynd11/Bs69. The MYND domain of Bs69 specifically binds the kinase domain of Bmpr1a and interferes with its phosphorylation and activation of Smad1/5/8. Mga acts to antagonize Bs69 and facilitate the Bmp signaling pathway by disrupting the Bs69-Bmpr1a association. Functionally, Bmp signaling under the control of Mga and Bs69 is required for properly specifying the ventral tailfin cell fate.
: Transferring Visual Representations for Reinforcement Learning via Prompting
It is important for deep reinforcement learning (DRL) algorithms to transfer
their learned policies to new environments that have different visual inputs.
In this paper, we introduce Prompt based Proximal Policy Optimization
(), a three-stage DRL algorithm that transfers visual representations
from a target to a source environment by applying prompting. The process
consists of three stages: pre-training, prompting, and predicting. In
particular, we specify a prompt-transformer for representation conversion and
propose a two-step training process to train the prompt-transformer for the
target environment, while the rest of the DRL pipeline remains unchanged. We
implement and evaluate it on the OpenAI CarRacing video game. The
experimental results show that it outperforms state-of-the-art visual
transfer schemes. In particular, it allows the learned policies to
perform well in environments with different visual inputs, which is much more
effective than retraining the policies in these environments.
Comment: This paper has been accepted for presentation at the upcoming IEEE
International Conference on Multimedia & Expo (ICME) in 2023.
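To make the "prompting" stage concrete, here is a minimal sketch of a prompt-transformer that re-maps target-environment observations into the representation space a frozen pre-trained policy expects. It is an illustrative PyTorch sketch under stated assumptions, not the paper's architecture; the observation shape, prompt length, and layer sizes are invented for the example.

```python
# Hedged sketch of a prompt-transformer for observation-space transfer.
# Shapes and hyperparameters are assumptions, not the paper's values.
import torch
import torch.nn as nn

class PromptTransformer(nn.Module):
    def __init__(self, obs_dim: int, prompt_len: int = 8, d_model: int = 128):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        self.prompt = nn.Parameter(torch.randn(prompt_len, d_model))  # learned prompt tokens
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, obs_dim)

    def forward(self, target_obs: torch.Tensor) -> torch.Tensor:  # (B, obs_dim)
        tokens = self.embed(target_obs).unsqueeze(1)               # (B, 1, d)
        prompt = self.prompt.unsqueeze(0).expand(tokens.size(0), -1, -1)
        x = self.encoder(torch.cat([prompt, tokens], dim=1))       # (B, P+1, d)
        return self.out(x[:, -1])                                  # source-like observation

# Training, conceptually: the pre-trained policy stays frozen and only the
# prompt-transformer's parameters receive gradients on target-environment data.
```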
Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection
LiDAR and Radar are two complementary sensing approaches in that LiDAR
specializes in capturing an object's 3D shape while Radar provides longer
detection ranges as well as velocity hints. Though seemingly natural, how to
efficiently combine them for improved feature representation is still unclear.
The main challenge arises from the fact that Radar data are extremely sparse and lack
height information. Therefore, directly integrating Radar features into
LiDAR-centric detection networks is not optimal. In this work, we introduce a
bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle the
challenges and improve 3D detection for dynamic objects. Technically,
Bi-LRFusion involves two steps: first, it enriches Radar's local features by
learning important details from the LiDAR branch to alleviate the problems
caused by the absence of height information and extreme sparsity; second, it
combines LiDAR features with the enhanced Radar features in a unified
bird's-eye-view representation. We conduct extensive experiments on nuScenes
and ORR datasets, and show that our Bi-LRFusion achieves state-of-the-art
performance for detecting dynamic objects. Notably, Radar data in these two
datasets have different formats, which demonstrates the generalizability of our
method. Code is available at https://github.com/JessieW0806/BiLRFusion.
Comment: Accepted by CVPR 2023.
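Below is a minimal sketch of the two-step, bi-directional idea described above, assuming both modalities have already been encoded onto the same BEV grid. The module names, channel sizes, and the use of plain convolutions are illustrative assumptions and do not reflect the released code.

```python
# Hedged sketch of bi-directional LiDAR-Radar fusion in BEV (assumed structure).
import torch
import torch.nn as nn

class BiDirectionalLRFusion(nn.Module):
    def __init__(self, lidar_c: int = 64, radar_c: int = 32, fused_c: int = 128):
        super().__init__()
        # Step 1: LiDAR -> Radar enrichment, compensating Radar's sparsity and
        # missing height information with details learned from the LiDAR branch.
        self.enrich = nn.Conv2d(lidar_c + radar_c, radar_c, kernel_size=3, padding=1)
        # Step 2: Radar -> LiDAR fusion of the enriched features in the shared BEV grid.
        self.fuse = nn.Conv2d(lidar_c + radar_c, fused_c, kernel_size=3, padding=1)

    def forward(self, lidar_bev: torch.Tensor, radar_bev: torch.Tensor) -> torch.Tensor:
        # Both inputs are (B, C, H, W) feature maps on the same BEV grid.
        enriched_radar = self.enrich(torch.cat([lidar_bev, radar_bev], dim=1))
        return self.fuse(torch.cat([lidar_bev, enriched_radar], dim=1))
```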